-
Notifications
You must be signed in to change notification settings - Fork 36
Feat/external eval server #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/external eval server #23
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a comprehensive external evaluation server system for testing AI Chat agent tools in DevTools. The feature enables external evaluation of AI Chat tools via WebSocket RPC communication between DevTools and a standalone evaluation server.
- External Evaluation Infrastructure: Complete WebSocket-based evaluation server with client management, authentication, and tool execution
- DevTools Integration: New evaluation configuration UI in Settings Dialog with connection management and testing capabilities
- Comprehensive Test Suite: 25+ evaluation test cases covering web task agents, schema extractors, and various real-world scenarios
Reviewed Changes
Copilot reviewed 114 out of 118 changed files in this pull request and generated 7 comments.
Show a summary per file
File | Description |
---|---|
front_end/panels/ai_chat/ui/SettingsDialog.ts | Adds evaluation configuration UI with connection status, endpoint settings, and test functionality |
front_end/panels/ai_chat/evaluation/EvaluationAgent.ts | Implements WebSocket client for receiving and executing evaluation requests from server |
front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts | Defines message protocol for client-server evaluation communication |
front_end/panels/ai_chat/common/EvaluationConfig.ts | Manages evaluation configuration, connection state, and localStorage persistence |
front_end/panels/ai_chat/common/WebSocketRPCClient.ts | Generic WebSocket RPC client with reconnection and error handling |
eval-server/src/server.js | Main evaluation server with client management, authentication, and evaluation orchestration |
eval-server/evals/web-task-agent/*.yaml | Comprehensive test cases for web task agent covering social media, search, booking, finance, etc. |
front_end/panels/ai_chat/**/ErrorHandlingUtils.ts | Improves error logging consistency across multiple files |
Files not reviewed (1)
- eval-server/package-lock.json: Language not supported
|
||
// Set up periodic status updates every 2 seconds | ||
const statusUpdateInterval = setInterval(updateConnectionStatus, 2000); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The status update interval is created but never cleaned up when the dialog is closed or destroyed. This could lead to memory leaks and unnecessary function calls. Consider storing the interval ID and clearing it in a cleanup function.
// Ensure the interval is cleared when the dialog is closed | |
this.addEventListener('close', () => { | |
clearInterval(statusUpdateInterval); | |
}); |
Copilot uses AI. Check for mistakes.
const clientIdInput = document.createElement('input'); | ||
clientIdInput.type = 'text'; | ||
clientIdInput.className = 'settings-input'; | ||
clientIdInput.value = currentEvaluationConfig.clientId || 'Auto-generated on first connection'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The fallback text 'Auto-generated on first connection' should be extracted to a constant or the UIStrings object for better maintainability and potential internationalization.
clientIdInput.value = currentEvaluationConfig.clientId || 'Auto-generated on first connection'; | |
clientIdInput.value = currentEvaluationConfig.clientId || i18nString(UIStrings.autoGeneratedOnFirstConnection); |
Copilot uses AI. Check for mistakes.
}); | ||
// For new clients, the server created the config and asks to reconnect | ||
// We can attempt to reconnect after a short delay | ||
setTimeout(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The hardcoded 1000ms delay for reconnection retry should be extracted to a constant or configuration parameter for better maintainability and testability.
Copilot uses AI. Check for mistakes.
timestamp: new Date().toISOString() | ||
}); | ||
} | ||
}, 30000); // Send ping every 30 seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The hardcoded 30000ms (30 seconds) heartbeat interval should be extracted to a constant or configuration parameter for better maintainability.
}, 30000); // Send ping every 30 seconds | |
}, HEARTBEAT_INTERVAL_MS); // Send ping every 30 seconds |
Copilot uses AI. Check for mistakes.
} | ||
|
||
this.currentReconnectAttempt++; | ||
const delay = this.reconnectDelay * Math.pow(2, this.currentReconnectAttempt - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exponential backoff calculation could result in very large delays for higher retry attempts. Consider adding a maximum delay cap to prevent excessively long wait times.
const delay = this.reconnectDelay * Math.pow(2, this.currentReconnectAttempt - 1); | |
const MAX_RECONNECT_DELAY = 30000; // Maximum delay cap in milliseconds (30 seconds) | |
const delay = Math.min(this.reconnectDelay * Math.pow(2, this.currentReconnectAttempt - 1), MAX_RECONNECT_DELAY); |
Copilot uses AI. Check for mistakes.
// Auto-create new client configuration | ||
try { | ||
logger.info('Auto-creating new client configuration', { clientId }); | ||
await this.clientManager.createClientWithId(clientId, `DevTools Client ${clientId.substring(0, 8)}`, 'hello'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded default secret key 'hello' poses a security risk. Consider generating a strong random secret key or requiring explicit secret key configuration for new clients.
await this.clientManager.createClientWithId(clientId, `DevTools Client ${clientId.substring(0, 8)}`, 'hello'); | |
const randomSecretKey = crypto.randomBytes(32).toString('hex'); | |
await this.clientManager.createClientWithId(clientId, `DevTools Client ${clientId.substring(0, 8)}`, randomSecretKey); |
Copilot uses AI. Check for mistakes.
// Don't automatically start evaluations - wait for manual trigger | ||
// this.processClientEvaluations(connection.clientId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] This commented-out line and its explanation suggest incomplete functionality. Either implement the automatic evaluation feature or remove the commented code and explanation to avoid confusion.
// Don't automatically start evaluations - wait for manual trigger | |
// this.processClientEvaluations(connection.clientId); | |
// Evaluations must be manually triggered. |
Copilot uses AI. Check for mistakes.
No description provided.